A random variable \(Y\) is a
function that maps outcomes from the sample space
\(S\) of a random experiment to real
numbers.
If \(Y\) takes values in a countable set (for example, the integers), it is called a discrete random variable.
The probability mass function (PMF) is denoted by \(P(Y = x),\ x \in \mathbb{N}\).
The cumulative distribution function (CDF) is denoted by \(P(Y \leq x),\ x \in \mathbb{N}\).
The expectation and variance of \(Y\) are denoted by \(E(Y)\) and \(\text{Var}(Y)\), respectively.
A random sample of size \(n\) from \(Y\) is denoted as:
\[ Y_1, Y_2, \dots, Y_n. \]
A statistic is any function of the sample, such as \(\bar{Y}_n\), which estimates the population mean \(E(Y)\).
An estimate is a statistic used to infer the value of an unknown parameter.
\(P(X = k) = \frac{2k - 1}{16}, \quad k = 1, 2, 3, 4.\)
Expectation: \(E(X) = \frac{25}{8} \approx 3.125.\)
Variance: \(\text{Var}(X) = \frac{55}{64} \approx 0.8594,\ \text{SD}(X) \approx 0.927.\)
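These values can be checked with a few lines of exact arithmetic; a minimal Python sketch using the PMF above:

```python
from fractions import Fraction

# PMF of the loaded die: P(X = k) = (2k - 1)/16, k = 1..4
pmf = {k: Fraction(2 * k - 1, 16) for k in range(1, 5)}

mean = sum(k * p for k, p in pmf.items())              # E(X)
second_moment = sum(k**2 * p for k, p in pmf.items())  # E(X^2)
variance = second_moment - mean**2                     # Var(X) = E(X^2) - E(X)^2

print(mean)      # 25/8
print(variance)  # 55/64
```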
Drawing \(n\) samples \(x_1, x_2, \dots, x_n\) from this distribution, we can compute the following 7 statistics:
Probability estimates: \(\hat{P}(X = k) = \frac{1}{n} \sum_{i=1}^{n} I_{\{x_i = k\}}, \quad k = 1,2,3,4.\)
Sample Mean: \(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.\)
Sample Variance: \(s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.\)
Chi-Square Statistic (an overall measure of the discrepancy between the estimated and expected probabilities): \[ \chi^2 = \sum_{k=1}^{4} \frac{(o_k - E_k)^2}{E_k}, \]
where \(o_k\) is the observed count of outcome \(k\), i.e. \(n \hat{P}(X = k)\),
and \(E_k\) is the expected count of outcome \(k\), i.e. \(n P(X = k)\).
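A minimal Python sketch of the \(\chi^2\) computation from one simulated sample (the seed and \(n\) are arbitrary choices):

```python
import random

random.seed(1)
n = 1000
probs = [(2 * k - 1) / 16 for k in range(1, 5)]  # P(X = k), k = 1..4

# simulate n rolls of the loaded die
sample = random.choices([1, 2, 3, 4], weights=probs, k=n)

observed = [sample.count(k) for k in range(1, 5)]  # o_k = n * P-hat(X = k)
expected = [n * p for p in probs]                  # E_k = n * P(X = k)
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)
```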
First, write a visualization version (non-interactive) to observe how the error between the simulated statistics (e.g., mean, standard deviation) and the theoretical values evolves as the sample size changes.
Then extend this into an interactive Shiny App version, letting the user freely adjust the sample size and the die weights and observe the changes in the absolute differences in real time.
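The course materials presumably use R/Shiny; purely as an illustrative outline, the non-interactive error-trend idea can be sketched in Python (seed and sample sizes are arbitrary choices):

```python
import random
import statistics

random.seed(0)
true_mean, true_sd = 25 / 8, (55 / 64) ** 0.5        # theoretical E(X), SD(X)
probs = [(2 * k - 1) / 16 for k in range(1, 5)]

# absolute error of the simulated mean/SD versus the theoretical values
for n in (50, 100, 500, 1000):
    x = random.choices([1, 2, 3, 4], weights=probs, k=n)
    mean_err = abs(statistics.fmean(x) - true_mean)
    sd_err = abs(statistics.stdev(x) - true_sd)
    print(n, round(mean_err, 4), round(sd_err, 4))
```

The errors generally shrink as \(n\) grows, which is the trend the visualization is meant to show.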
\(Y_1, \cdots, Y_n\) is a random sample from \(Y\). Hence, \(h(Y_1, \cdots, Y_n)\) is a random variable, where \(h\) is a function that maps the sample to a real number.
\(\hat{P}(X = k) = \frac{1}{n} \sum_{i=1}^{n} I_{\{X_i = k\}}, \quad k = 1, 2, 3, 4.\)
\(\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.\)
\(S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.\)
Usually, it is not easy to derive the distribution of \(h(Y_1, \cdots, Y_n)\), even if we know the distribution of \(Y\).
If it is not feasible to derive the distribution analytically, we can approximate it by repeating the sampling process \(n_{\text{rep}}\) times, that is, by generating \(n_{\text{rep}}\) values of \(h(Y_1, \cdots, Y_n)\).
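A minimal Python sketch of this replication idea, with \(h\) taken to be the sample mean and the loaded-die PMF from above (seed and sizes are arbitrary choices):

```python
import random
import statistics

random.seed(42)
n, n_rep = 50, 1000
probs = [(2 * k - 1) / 16 for k in range(1, 5)]

# n_rep replicated values of h(Y_1, ..., Y_n), with h = sample mean
h_values = []
for _ in range(n_rep):
    sample = random.choices([1, 2, 3, 4], weights=probs, k=n)
    h_values.append(statistics.fmean(sample))

# the empirical distribution of h_values approximates the sampling distribution
print(statistics.fmean(h_values), statistics.variance(h_values))
```

The mean of `h_values` should be close to \(E(X) = 25/8\), and their variance close to \(\text{Var}(X)/n = (55/64)/50\).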
Below, for \(n = 50, 100, 500, 1000\), we generate the four probability estimates together with the sample mean, sample variance, and sample standard deviation, \(n_{\text{rep}} = 1000\) times each. The values within each row come from the same simulated data.
## 'data.frame': 4000 obs. of 8 variables:
## $ n : num 50 50 50 50 50 50 50 50 50 50 ...
## $ max_1 : num 0.04 0.08 0.06 0.06 0.08 0.1 0.1 0.08 0.04 0.06 ...
## $ max_2 : num 0.22 0.12 0.1 0.2 0.08 0.2 0.14 0.22 0.18 0.3 ...
## $ max_3 : num 0.32 0.4 0.34 0.36 0.28 0.18 0.34 0.22 0.32 0.28 ...
## $ max_4 : num 0.42 0.4 0.5 0.38 0.56 0.52 0.42 0.48 0.46 0.36 ...
## $ max_mean: num 3.12 3.12 3.28 3.06 3.32 3.12 3.08 3.1 3.2 2.94 ...
## $ max_var : num 0.802 0.842 0.777 0.833 0.875 ...
## $ max_sd : num 0.895 0.918 0.882 0.913 0.935 ...
Weak Law of Large Numbers (WLLN): Let \(Y_1, Y_2, \ldots\) be i.i.d. random variables with \(E(|Y|) < \infty\).
Then for any \(\varepsilon > 0\),
\[
\lim_{n \to \infty} P\left( \left| \bar{Y}_n - E(Y) \right| \geq
\varepsilon \right) = 0,
\] denoted by \[
\bar{Y}_n \xrightarrow{p} E(Y).
\]
Properties of the Sample Mean \(E(\bar{Y}_n) = E(Y)\) (unbiased estimator) and \(\text{Var}(\bar{Y}_n) = \frac{1}{n}\text{Var}(Y)\).
\(\hat{p}_k = \frac{1}{n} \sum_{i=1}^{n} I_{\{X_i = k\}}\xrightarrow{p} P(X = k).\) (for \(k = 1, 2, 3, 4\))
\(\text{Var}(\hat{p}_k) = \frac{1}{n} p_k (1 - p_k)\).
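The WLLN for \(\hat{p}_k\) can be observed directly; a minimal Python sketch for \(k = 4\) (seed and sample sizes are arbitrary choices):

```python
import random

random.seed(7)
probs = [(2 * k - 1) / 16 for k in range(1, 5)]

# p-hat_4 should approach p_4 = 7/16 as n grows (WLLN)
for n in (100, 10_000, 1_000_000):
    x = random.choices([1, 2, 3, 4], weights=probs, k=n)
    phat_4 = sum(1 for xi in x if xi == 4) / n
    print(n, phat_4, abs(phat_4 - 7 / 16))
```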
Central Limit Theorem (CLT):
Let \(Y_1, Y_2, \cdots\) be i.i.d.
random variables with \(E(Y) <
\infty\) and \(\text{Var}(Y) <
\infty\). Then, for all \(z \in
\mathbb{R}\), \[
\lim_{n \rightarrow \infty} P\left( \frac{\sqrt{n}(\bar{Y}_n -
E(Y))}{\sqrt{\text{Var}(Y)}} \leq z \right) = \Phi(z),
\] which is often denoted as: \[
\frac{\sqrt{n}(\bar{Y}_n - E(Y))}{\sqrt{\text{Var}(Y)}} \xrightarrow{d}
N(0, 1).
\]
Example: By the CLT, we have: \[ \frac{\sqrt{n}(\hat{p}_k - p_k)}{\sqrt{p_k(1 - p_k)}} \xrightarrow{d} N\left( 0, 1\right). \tag{1} \]
That is, for large \(n\), \[ \hat{p}_k \approx N\left( p_k, \, \frac{p_k(1 - p_k)}{n} \right) \]
\(n \hat{p}_k = \sum_{i=1}^{n} I_{\{X_i = k\}} \sim \text{Binomial}(n, p_k).\)
Moreover, \(\widehat{\text{Var}}(I_{\{X_i = k\}}) = \hat{p}_k (1 - \hat{p}_k) \xrightarrow{p} p_k (1 - p_k)\) by the WLLN and the continuous mapping theorem. Then, by Slutsky’s theorem, combining this with \[ \sqrt{n}(\hat{p}_k - p_k) \xrightarrow{d} N(0, p_k(1 - p_k)) \] yields \[ \frac{\sqrt{n}( \hat{p}_k - p_k )}{ \sqrt{ \hat{p}_k (1 - \hat{p}_k) } } \xrightarrow{d} N(0, 1). \tag{2} \]
The sample variance based on \(I_{\{X_i = k\}}, \ i = 1, \ldots, n\), is an unbiased estimator of the true variance \(\text{Var}(I_{\{X_i = k\}}) = p_k(1 - p_k).\) In contrast, \(\widehat{\text{Var}}(I_{\{X_i = k\}}) = \hat{p}_k (1 - \hat{p}_k)\) is a biased estimator, but it is consistent. Similarly, following the same reasoning as in Equation (2), we can derive an alternative version of the distributional convergence by substituting the population variance with the sample variance.
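A minimal Python sketch of the two standardized statistics in Equations (1) and (2) for a single simulated sample (seed and \(n\) are arbitrary choices):

```python
import random
import math

random.seed(3)
n, p4 = 500, 7 / 16                      # focus on k = 4, with p_4 = 7/16
probs = [(2 * j - 1) / 16 for j in range(1, 5)]

x = random.choices([1, 2, 3, 4], weights=probs, k=n)
phat = sum(1 for xi in x if xi == 4) / n

# Z with the theoretical variance p_k(1 - p_k), as in Equation (1)
z_theo = math.sqrt(n) * (phat - p4) / math.sqrt(p4 * (1 - p4))
# Z with the plug-in variance phat(1 - phat), as in Equation (2)
z_plug = math.sqrt(n) * (phat - p4) / math.sqrt(phat * (1 - phat))
print(z_theo, z_plug)
```

Both statistics are approximately standard normal for large \(n\); the Shapiro-Wilk tables below compare exactly these two versions.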
##
## ### Shapiro-Wilk p-values for max_1
##
## # A tibble: 4 × 3
## n `Z: Plug-in` `Z: Theoretical`
## <fct> <chr> <chr>
## 1 50 0.0000* 0.0000*
## 2 100 0.0000* 0.0000*
## 3 500 0.0000* 0.0053*
## 4 1000 0.0050* 0.0006*
##
## ### Shapiro-Wilk p-values for max_2
##
## # A tibble: 4 × 3
## n `Z: Plug-in` `Z: Theoretical`
## <fct> <chr> <chr>
## 1 50 0.0000* 0.0000*
## 2 100 0.0000* 0.0000*
## 3 500 0.0073* 0.1608
## 4 1000 0.0075* 0.2514
##
## ### Shapiro-Wilk p-values for max_3
##
## # A tibble: 4 × 3
## n `Z: Plug-in` `Z: Theoretical`
## <fct> <chr> <chr>
## 1 50 0.0000* 0.0000*
## 2 100 0.0000* 0.0015*
## 3 500 0.6937 0.3390
## 4 1000 0.0115* 0.0882
##
## ### Shapiro-Wilk p-values for max_4
##
## # A tibble: 4 × 3
## n `Z: Plug-in` `Z: Theoretical`
## <fct> <chr> <chr>
## 1 50 0.0000* 0.0000*
## 2 100 0.0003* 0.0006*
## 3 500 0.1226 0.1109
## 4 1000 0.1553 0.1235
Marginal Distribution
Order Statistics
## The covariance matrix:
## max_1 max_2 max_3 max_4
## max_1 0.000116 -0.000024 -0.000032 -0.000060
## max_2 -0.000024 0.000314 -0.000128 -0.000162
## max_3 -0.000032 -0.000128 0.000440 -0.000280
## max_4 -0.000060 -0.000162 -0.000280 0.000503
## # A tibble: 4 × 2
## n p_value
## <dbl> <chr>
## 1 50 0.0001*
## 2 100 0.0005*
## 3 500 0.3353
## 4 1000 0.2094
Central Limit Theorem (CLT): \(E(X)=\frac{25}{8}\) and \(\text{Var}(X)=\frac{55}{64}\)
Using the true variance: \(\frac{\sqrt{n}(\bar{X}_n - E(X))}{\sqrt{\text{Var}(X)}} \xrightarrow{d} N(0, 1)\)
Using the sample variance: \(\frac{\sqrt{n}(\bar{X}_n - E(X))}{\sqrt{S^2_n}} \xrightarrow{d} N(0, 1)\)
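The two standardizations can be compared directly on one simulated sample; a minimal Python sketch (seed and \(n\) are arbitrary choices):

```python
import random
import math
import statistics

random.seed(11)
n = 1000
mu, var = 25 / 8, 55 / 64                # theoretical E(X) and Var(X)
probs = [(2 * k - 1) / 16 for k in range(1, 5)]

x = random.choices([1, 2, 3, 4], weights=probs, k=n)
xbar, s2 = statistics.fmean(x), statistics.variance(x)

z_true = math.sqrt(n) * (xbar - mu) / math.sqrt(var)  # true variance
z_samp = math.sqrt(n) * (xbar - mu) / math.sqrt(s2)   # sample variance
print(z_true, z_samp)
```

Since \(S_n^2 \xrightarrow{p} \text{Var}(X)\), the two values are close for large \(n\).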
Sample Variance:
\[
S_n^2 = \frac{1}{n - 1} \sum_{i = 1}^{n} (Y_i - \bar{Y}_n)^2
\]
\(E(S_n^2) = \sigma^2\): an unbiased estimator of the population variance
\(\text{Var}(S_n^2) = \frac{1}{n} (\kappa - 1 + \frac{2}{n-1}) \sigma^4\), where \(\kappa\) is the kurtosis of the population distribution.
\(\text{Var}(S_n^2) \approx \frac{1}{n} (\kappa - 1) \sigma^4\).
\(S_n^2\) is a consistent estimator of the population variance \(\sigma^2\); that is, \(S_n^2 \xrightarrow{p} \sigma^2\) as \(n \to \infty\).
\(\sqrt{n}(S_n^2 - \sigma^2) \xrightarrow{d} N\left( 0, (\kappa - 1) \sigma^4 \right)\) (try proving this; you can also verify it by simulation)
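A minimal Python sketch verifying \(\text{Var}(S_n^2)\) against the finite-sample formula above, using the loaded-die distribution from earlier (seed, \(n\), and replication count are arbitrary choices):

```python
import random
import statistics

random.seed(5)
n, n_rep = 100, 2000
values = [1, 2, 3, 4]
probs = [(2 * k - 1) / 16 for k in values]

mu, var = 25 / 8, 55 / 64
# population kurtosis: kappa = E[(X - mu)^4] / sigma^4
mu4 = sum(p * (v - mu) ** 4 for v, p in zip(values, probs))
kappa = mu4 / var**2

# replicate S_n^2 and compare its empirical variance with the formula
s2_values = []
for _ in range(n_rep):
    x = random.choices(values, weights=probs, k=n)
    s2_values.append(statistics.variance(x))

print(statistics.variance(s2_values))          # simulated Var(S_n^2)
print((kappa - 1 + 2 / (n - 1)) * var**2 / n)  # theoretical Var(S_n^2)
```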
For the normal distribution (\(\kappa = 3\)), this gives \(\text{Var}(S_n^2) = \frac{2\sigma^4}{n-1}\).
A biased sample variance: \[ {S'}_n^2 = \frac{1}{n} \sum_{i = 1}^{n} (Y_i - \bar{Y}_n)^2 \]
\(E({S'}_n^2) = \frac{n - 1}{n} \sigma^2 < \sigma^2\)
This estimator is biased, but consistent as \(n \to \infty\).
Sample Standard Deviation: \(S_n = \sqrt{S_n^2}\)
\(E(S_n) < \sigma\): a biased estimator of \(\sigma\)
For the normal distribution, an unbiased estimator of \(\sigma\) is given by: \[ \hat{\sigma} = \frac{S_n}{c_4(n)} \quad \text{where } c_4(n) = \sqrt{\frac{2}{n-1}} \cdot \frac{\Gamma(n/2)}{\Gamma((n-1)/2)} \]
See Wikipedia: Unbiased estimation of standard deviation for details.
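For reference, \(c_4(n)\) is easy to compute directly from the Gamma function; a short Python sketch:

```python
import math

def c4(n):
    # c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
    return math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

# c4(n) increases toward 1 as n grows, so the bias of S_n vanishes
for n in (2, 5, 10, 100):
    print(n, round(c4(n), 6))
```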
Food for thought: if the die is unfair, then:
What would the distribution \(P(X = k)\) look like? What values would the expectation and variance take?
Besides computing by hand, how else could you compute them?